Thanks! My program effectively got a 33x [*] speedup: 0.037 ms vs 1.25 ms.
I threw everything into a Makefile. It compiles your code, locates the necessary symbols, and links the program. I've put it at https://gist.github.com/ktossell/10439006. Currently it's only set up for building programs that run from the internal RAM.
Does this linker script look all right?
/* autogenerated linker script for myprog, thread 3
* source files: main.c leds.c
*/
-c
-heap 15700000
-stack 0x800
_ClearBit = 0x10010f28;
_SetBit = 0x10010d20;
_printf = 0x80023140;
_sinf = 0x1000ecd4;
MEMORY {
IRAM: o = 0x1001c000, l = 0x00004000
THREAD_MEM: o = 0x80070000, l = 0x00010000
SDRAM: o = 0x80100000, l = 0x00f00000
}
SECTIONS {
.placeholder: palign(8), fill = 0xaaaaaaaa {. += 4;} > THREAD_MEM
.text > IRAM
.far > IRAM
.const > IRAM
}
I didn't know if the other sections were necessary for user programs. This seems to cover everything my program needs.
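One way to sanity-check the MEMORY block above is to confirm that the regions do not overlap and that the hard-coded firmware symbols fall outside the regions handed to the user program. The following is only a host-side sketch: the region_t type and the helper names are hypothetical, and the addresses are copied from the script as posted.

```c
#include <stdbool.h>
#include <stdint.h>

/* One entry of the MEMORY directive: origin and length in bytes. */
typedef struct {
    const char *name;
    uint32_t origin;
    uint32_t length;
} region_t;

/* Values copied from the MEMORY block of the linker script above. */
const region_t regions[] = {
    { "IRAM",       0x1001c000u, 0x00004000u },
    { "THREAD_MEM", 0x80070000u, 0x00010000u },
    { "SDRAM",      0x80100000u, 0x00f00000u },
};
enum { NUM_REGIONS = sizeof regions / sizeof regions[0] };

/* First address past the end of a region (64-bit to avoid wraparound). */
uint64_t region_end(const region_t *r)
{
    return (uint64_t)r->origin + r->length;
}

/* True if addr lies inside the region. */
bool region_contains(const region_t *r, uint32_t addr)
{
    return addr >= r->origin && (uint64_t)addr < region_end(r);
}

/* True if two regions share at least one address. */
bool regions_overlap(const region_t *a, const region_t *b)
{
    return a->origin < region_end(b) && b->origin < region_end(a);
}

/* True if no pair of regions in the table overlaps. */
bool all_regions_disjoint(void)
{
    for (int i = 0; i < NUM_REGIONS; i++)
        for (int j = i + 1; j < NUM_REGIONS; j++)
            if (regions_overlap(&regions[i], &regions[j]))
                return false;
    return true;
}
```

For example, region_contains(&regions[0], 0x10010f28u) checks whether the address assigned to _ClearBit falls inside the user IRAM region; a false result is what one would hope for, since that address belongs to the firmware rather than to the user program.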
[*] where 3/4 of the 1.25ms was the program waiting for the thread to be rescheduled; actual running time was probably more like 0.4 ms, so a ~10x speedup.
---In DynoMotion@yahoogroups.com, <tk@...> wrote :
Hi,
I don't know of anyone using the TI Optimizing Compiler. I don't think a gcc compiler exists for the TI C6722 processor. This is why we created the TCC67 compiler from TCC.
Code executes much faster using the (slow/expensive) TI Optimizing compiler. But most User programs just make function calls to KFLOP's internal optimized routines so it doesn't make much difference.
Besides the Optimizing compiler, what can speed up code a lot is placing it in Internal DSP RAM. There is only 128 KBytes of Internal RAM, but it is really fast (single cycle and 256 bits wide).
I think you will need to make a .cmd file for the TI Compile/Link to place the code in the right place.
I made an example. See the attached Zip file (you will need to re-name it). UnZip it as a directory TI_Compiler under the C Programs folder.
There is a Batch file and some linker files to compile an example to be run as Thread#2.
Unfortunately we don't have an easy way to import the symbols into the TI compiler from the KFLOP Binary the way we have for TCC67. But if you only need to access a few functions in KFLOP you can hard code them in the linker file.
There is a procedure to follow included. It is attached separately as well.
Check out the Video!
Regards
TK
Group: DynoMotion | Message: 9403 | From: Tom Kerekes | Date: 4/12/2014 | Subject: Re: Linking in code compiled with cl6x or gcc
Nice. Wow you are the Master of makefiles!
It looks correct to me. I think those are all the sections needed for simple C code.
I don't see why you say only a 10X speedup. Both methods should only get about 1/4 of the CPU.
Thanks, TK
Group: DynoMotion | Message: 9404 | From: k_dm927 | Date: 4/12/2014 | Subject: Re: Linking in code compiled with cl6x or gcc
Yeah, I'm not sure whether I'm thinking of the speedup in a useful way. My thoughts were:
My code has a 1 kHz loop. It calls WaitUntil(x), incrementing x by 0.001 each time, so the thread should wake up as soon as possible after each 1 ms period ends. This period might end while the thread is scheduled, or it might end somewhere in the remaining 72% of the time.
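The loop structure described above can be sketched on a host machine as follows. This is only an illustration under stated assumptions: the WaitUntil() here is a stand-in that advances a simulated clock, not the KFLOP routine, and sim_now, do_control_step() and run_loop() are hypothetical names that do not come from the original program.

```c
/* Simulated clock in seconds; stands in for the controller's real-time clock. */
double sim_now = 0.0;

/* Host-side stand-in for WaitUntil(): advance the clock to t.
   If t has already passed, return immediately. */
void WaitUntil(double t)
{
    if (t > sim_now)
        sim_now = t;
}

/* Hypothetical placeholder for "gather inputs ... command axes";
   it only consumes work_s seconds of simulated time. */
void do_control_step(double work_s)
{
    sim_now += work_s;
}

/* Run n iterations of a fixed-period loop starting at start_s and
   return the simulated time at which the last iteration finishes. */
double run_loop(double start_s, int n, double period_s, double work_s)
{
    sim_now = start_s;
    double x = start_s;              /* next wake-up time */
    for (int i = 0; i < n; i++) {
        x += period_s;               /* absolute deadline, so the period does not drift */
        WaitUntil(x);
        do_control_step(work_s);
    }
    return sim_now;
}
```

With a 0.001 s period, a work time of 37 us leaves the loop waiting on every pass, whereas a work time of 1250 us makes each deadline pass before the next call to WaitUntil(), so the simulated loop falls steadily behind.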
Built with TCC, the code takes 1250 us of real time to get from "gather inputs" to "command axes". In 1250 us, the KFLOP will have gone through its servo-system-servo-user cycle about 7 times, during which the user thread will have run for about 7 x 50us = 350us. But when the optimized code is running, there's a good chance that the 37us it needs will fit inside a single 50us thread scheduling period (because the timer probably expired before the thread was called, and the thread can wake up immediately once it's scheduled).
The ratio in real time is 1250/37 = 33.8, but the ratio in time spent actively calculating my outputs is 350/37 = 9.5.
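The two ratios can be reproduced with a few lines of C. This is a minimal sketch; the function names are hypothetical, and the inputs (1250 us, 37 us, 7 cycles, 50 us per user slice) are the estimates quoted in this message rather than measured scheduler parameters.

```c
/* Ratio of two durations given in the same unit. */
double speedup(double before_us, double after_us)
{
    return before_us / after_us;
}

/* User-thread time accumulated over a number of scheduler cycles. */
double active_time_us(int cycles, double slice_us)
{
    return cycles * slice_us;
}
```

Here speedup(1250.0, 37.0) gives the real-time ratio, and speedup(active_time_us(7, 50.0), 37.0) gives the ratio of time spent actively computing.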
Group: DynoMotion | Message: 9406 | From: Tom Kerekes | Date: 4/12/2014 | Subject: Re: Linking in code compiled with cl6x or gcc
Oops. I think you are exactly correct. I misread your "0.037 vs 1.25 ms" as 0.037 seconds vs 1.25 milliseconds, which reversed the comparison.
BTW it looks like your makefile specifies -O2 for optimization. I always use -O3. Have you tried -O3? I suppose it wouldn't actually matter to you since your calculation completes in one time slice anyway.
Regards TK